39 research outputs found

    Differentially Private Exponential Random Graphs

    We propose methods to release and analyze synthetic graphs in order to protect privacy of individual relationships captured by the social network. Proposed techniques aim at fitting and estimating a wide class of exponential random graph models (ERGMs) in a differentially private manner, and thus offer rigorous privacy guarantees. More specifically, we use the randomized response mechanism to release networks under $\epsilon$-edge differential privacy. To maintain utility for statistical inference, treating the original graph as missing, we propose a way to use likelihood based inference and Markov chain Monte Carlo (MCMC) techniques to fit ERGMs to the produced synthetic networks. We demonstrate the usefulness of the proposed techniques on a real data example.
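
As a rough illustration of the randomized response idea mentioned in this abstract, the sketch below flips each potential edge of an undirected graph independently with probability 1/(1 + e^ε), a standard choice that satisfies ε-edge differential privacy. The function name, the edge-list representation, and this particular parameterization are assumptions for illustration; the paper's own mechanism may be parameterized differently.

```python
import math
import random


def randomized_response_graph(n, edges, epsilon, rng=None):
    """Release a noisy copy of an undirected graph on nodes 0..n-1.

    Hypothetical helper (not from the paper): every dyad {i, j} has its
    edge indicator flipped independently with probability
    1 / (1 + exp(epsilon)).
    """
    rng = rng or random.Random()
    flip_prob = 1.0 / (1.0 + math.exp(epsilon))
    # Store edges as unordered pairs so (i, j) and (j, i) are the same dyad.
    edge_set = {frozenset(e) for e in edges}
    released = set()
    for i in range(n):
        for j in range(i + 1, n):
            present = frozenset((i, j)) in edge_set
            if rng.random() < flip_prob:
                present = not present  # flip this dyad
            if present:
                released.add((i, j))
    return released


if __name__ == "__main__":
    noisy = randomized_response_graph(5, [(0, 1), (1, 2), (3, 4)], epsilon=1.0)
    print(sorted(noisy))
```

Smaller values of ε flip more dyads (at ε = 0 each dyad is flipped with probability 1/2), which is why inference on the released graph has to account for the noise rather than treat it as the original network.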

    Vertex Clustering in Random Graphs via Reversible Jump Markov Chain Monte Carlo

    Networks are a natural and effective tool to study relational data, in which observations are collected on pairs of units. The units are represented by nodes and their relations by edges. In biology, for example, proteins and their interactions, and, in social science, people and inter-personal relations may be the nodes and the edges of the network. In this paper we address the question of clustering vertices in networks, as a way to uncover homogeneity patterns in data that enjoy a network representation. We use a mixture model for random graphs and propose a reversible jump Markov chain Monte Carlo algorithm to infer its parameters. Applications of the algorithm to one simulated data set and three real data sets, which describe friendships among members of a university karate club, social interactions of dolphins, and gap junctions in C. elegans, are given.

    Sequential design of computer experiments for the estimation of a probability of failure

    This paper deals with the problem of estimating the volume of the excursion set of a function $f:\mathbb{R}^d \to \mathbb{R}$ above a given threshold, under a probability measure on $\mathbb{R}^d$ that is assumed to be known. In the industrial world, this corresponds to the problem of estimating a probability of failure of a system. When only an expensive-to-simulate model of the system is available, the budget for simulations is usually severely limited and therefore classical Monte Carlo methods ought to be avoided. One of the main contributions of this article is to derive SUR (stepwise uncertainty reduction) strategies from a Bayesian-theoretic formulation of the problem of estimating a probability of failure. These sequential strategies use a Gaussian process model of $f$ and aim at performing evaluations of $f$ as efficiently as possible to infer the value of the probability of failure. We compare these strategies to other strategies also based on a Gaussian process model for estimating a probability of failure. Comment: This is an author-generated postprint version. The published version is available at http://www.springerlink.co
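
The quantity described in this abstract can be written out as follows. The symbols $u$ (threshold), $\alpha$ (probability of failure), $\mathsf{P}_X$ (the known probability measure) and $m$ (number of simulations) are notation assumed here for illustration and are not taken from the abstract.

```latex
% Probability of failure: volume of the excursion set of f above u under P_X
\alpha \;=\; \mathsf{P}_X\bigl(\{x \in \mathbb{R}^d : f(x) > u\}\bigr)
       \;=\; \int_{\mathbb{R}^d} \mathbf{1}_{\{f(x) > u\}} \,\mathrm{d}\mathsf{P}_X(x)

% Classical Monte Carlo estimator from m i.i.d. draws X_1, \dots, X_m \sim \mathsf{P}_X
\hat{\alpha}_m \;=\; \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}_{\{f(X_i) > u\}},
\qquad
\operatorname{Var}(\hat{\alpha}_m) \;=\; \frac{\alpha(1-\alpha)}{m}
```

The relative standard deviation of this estimator is $\sqrt{(1-\alpha)/(m\alpha)}$, so a small $\alpha$ requires on the order of $1/\alpha$ or more evaluations of $f$, which is consistent with the abstract's point that plain Monte Carlo is impractical when each evaluation is expensive.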

    Sequential importance sampling for bipartite graphs with applications to likelihood-based inference

    The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of test statistics. This paper builds on the work of Chen et al. (2005), providing an intuitive explanation of the sequential importance sampling algorithm as well as several examples to illustrate how the algorithm can be implemented for bipartite graphs. We examine the performance of sequential importance sampling for likelihood-based inference in comparison with Markov chain Monte Carlo, and find little empirical evidence to suggest that sequential importance sampling outperforms Markov chain Monte Carlo, even for sparse graphs or graphs with skewed marginals.

    networksis: A package to simulate bipartite graphs with fixed marginals through sequential importance sampling

    The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of graph statistics. This paper describes the networksis package for R and how its simulate and simulate_sis functions can be used to address both of these tasks as well as generate initial graphs for Markov chain Monte Carlo simulations.

    Early entry to fatherhood estimated from men's and women's survey reports in combination

    While underreporting of fatherhood is a widely acknowledged problem, satisfactory methods for its correction have yet to be developed. In the present study, we investigate methods of correction that are specific to marital status at the time of the birth and at the time of retrospective reporting, focusing on fatherhood under age 30. Matched women’s and men’s survey reports of births, in each case reported by marital status and age of the father, form the basis for our corrections. Male age-specific fertility rates are estimated from these survey data by using women’s reports for the births numerator and men’s reports for the exposed-years denominator. These are shown to match well to male age-specific fertility rates estimated from population data sources. When marital births in the men’s and women’s survey data are differentiated by whether the birth is within a current or previous marriage, only for births in previous marriages is there a male reporting deficit. Further, this deficit is completely explained by under-representation of men’s exposed years in previous marriages. We find no evidence of underreporting of births for those exposed years. These results are used to develop a constrained maximum likelihood estimator in which male fertility is constrained by age and marital status, with a focus on correcting for underreported non-marital fertility.

    A framework for the comparison of maximum pseudo likelihood and maximum likelihood estimation of exponential family random graph models

    The statistical modeling of social network data is difficult due to the complex dependence structure of the tie variables. Statistical exponential families of distributions provide a flexible way to model such dependence. They enable the statistical characteristics of the network to be encapsulated within an exponential family random graph (ERG) model. For a long time, however, likelihood-based estimation was only feasible for ERG models assuming dyad independence. For more realistic and complex models inference has been based on the pseudo-likelihood. Recent advances in computational methods have made likelihood-based inference practical, and comparison of the different estimators possible. In this paper, we present methodology to enable estimators of ERG model parameters to be compared. We use this methodology to compare the bias, standard errors, coverage rates and efficiency of maximum likelihood and maximum pseudo-likelihood estimators. We also propose an improved pseudo-likelihood estimation method aimed at reducing bias. The comparison is performed using simulated social network data based on two versions of an empirically realistic network model, the first representing Lazega’s law firm data and the second a modified version with increased transitivity. The framework considers estimation of both the natural and the mean-value parameters. The results clearly show the superiority of the likelihood-based estimators over those based on pseudo-likelihood, with the bias-reduced pseudo-likelihood out-performing the general pseudo-likelihood. The use of the mean-value parameterization provides insight into the differences between the estimators and when these differences will matter in practice.